perm filename ARRAY[COM,LSP] blob sn#679549 filedate 1982-10-01 generic text, type T, neo UTF8
Simple Switch Proposal
∂17-Sep-82  1318	Scott E. Fahlman <Fahlman at Cmu-20c> 	Revised array proposal (long)  
Date: Thursday, 16 September 1982  23:27-EDT
From: Scott E. Fahlman <Fahlman at Cmu-20c>
To:   common-lisp at SU-AI
Subject: Revised array proposal (long)


Here is a revision of my array proposal, fixed up in response to some of
the feedback I've received.  See if you like it any better than the
original.  In particular, I have explictly indicated that certain
redundant forms such as MAKE-VECTOR should be retained, and I have
removed the :PRINT keyword, since I now believe that it causes more
trouble than it is worth.  A revised printing proposal appears at the
end of the document.

**********************************************************************

Arrays can be 1-D or multi-D.  All arrays can be created by MAKE-ARRAY
and can be accessed with AREF.  Storage is done via SETF of an AREF.
The term VECTOR refers to any array of exactly one dimension.
Vectors are special, in that they are also sequences, and can be
referenced by ELT.  Also, only vectors can have fill pointers.

Vectors can be specialized along several distinct axes.  The first is by
the type of the elements, as specified by the :ELEMENT-TYPE keyword to
MAKE-ARRAY.  A vector whose element-type is STRING-CHAR is referred to
as a STRING.  Strings, when they print, use the "..." syntax; they also
are the legal inputs to a family of string-functions, as defined in the
manual.  A vector whose element-type is BIT (alias (MOD 2)), is a
BIT-VECTOR.  These are special because they form the set of legal inputs
to the boolean bit-vector functions.  (We might also want to print them
in a strange way -- see below.)

Some implementations may provide a special, highly efficient
representation for simple vectors.  A simple vector is (of course) 1-D,
cannot have a fill pointer, cannot be displaced, and cannot be altered
in size after its creation.  To get a simple vector, you use the :SIMPLE
keyword to MAKE-ARRAY with a non-null value.  If there are any
conflicting options specified, an error is signalled.  If an
implementation does not support simple vectors, this keyword/value is
ignored except that the error is still signalled on inconsistent cases.

We need a new set of type specifiers for simple things: SIMPLE-VECTOR,
SIMPLE-STRING, and SIMPLE-BIT-VECTOR, with the corresponding
type-predicate functions.  Simple vectors are referenced by AREF in the
usual way, but the user may use THE or DECLARE to indicate at
compile-time that the argument is simple, with a corresponding increase
in efficiency.  Implementations that do not support simple vectors
ignore the "simple" part of these declarations.

Strings (simple or non-simple) self-eval; all other arrays cause an
error when passed to EVAL.  EQUAL descends into strings, but not
into any other arrays.  EQUALP descends into arrays of all kinds,
comparing the corresponding elements with EQUALP.  EQUALP is false
if the array dimensions are not the same, but it is not sensitive to
the element-type of the array, whether it is simple, etc.  In comparing
the dimensions of vectors, EQUALP uses the length from 0 to the fill
pointer; it does not look at any elements beyond the fill pointer.

The set of type-specifiers required for all of this is ARRAY, VECTOR,
STRING, BIT-VECTOR, SIMPLE-VECTOR, SIMPLE-STRING, SIMPLE-BIT-VECTOR.
Each of these has a corresponding type-P predicate, and each can be
specified in list from, along with the element-type and dimension(s).

MAKE-ARRAY takes the following keywords: :ELEMENT-TYPE, :INITIAL-VALUE,
:INITIAL-CONTENTS, :FILL-POINTER, and :SIMPLE.  There is still some
discussion as to whether we should retain array displacement, which
requires :DISPLACED-TO and :DISPLACED-INDEX-OFFSET.

The following functions are redundant, but should be retained for
clarity and emphasis in code: MAKE-VECTOR, MAKE-STRING, MAKE-BIT-VECTOR.
MAKE-VECTOR takes the same keywords as MAKE-ARRAY, but can only take a
single integer as the dimension argument.  MAKE-STRING and
MAKE-BIT-VECTOR are like MAKE-VECTOR, but do not take the :ELEMENT-TYPE
keyword, since the element-type is implicit.  Similarly, we should
retain the forms VREF, CHAR, and BIT, which are identical in operation
to AREF, but which declare their aray argument to be VECTOR, STRING, or
BIT-VECTOR, respectively.

If the :SIMPLE keyword is not specified to MAKE-ARRAY or related forms,
the default is NIL.  However, vectors produced by random forms such as
CONCATENATE are simple, and vectors created when the reader sees #(...)
or "..." are also simple.

As a general rule, arrays are printed in a simple format that, upon
being read back in, produces a form that is EQUALP to the original.
However, some information may be lost in the printing process:
element-type restrictions, whether a vector is simple, whether it has a
fill pointer, whether it is displaced, and the identity of any element
that lies beyond the fill pointer.  This choice was made to favor ease
of interactive use; if the user really wants to preserve in printed form
some complex data structure containing non-simple arrays, he will have
to develop his own printer.

A switch, SUPPRESS-ARRAY-PRINTING, is provided for users who have lots
of large arrays around and don't want to see them trying to print.  If
non-null, this switch causes all arrays except strings to print in a
short, non-readable form that does not include the elements:
#<array-...>.  In addition, the printing of arrays and vectors (but not
of strings) is subject to PRINLEVEL and PRINLENGTH.

Strings, simple or otherwise, print using the "..."  syntax.  Upon
read-in, the "..." syntax creates a simple string.

Bit-vectors, simple or otherwise, print using the #"101010..." syntax.
Upon read-in, this format produces a simple bit-vector.  Bit vectors do
observe SUPPRESS-ARRAY-PRINTING.

All other vectors print out using the #(...) syntax, observing
PRINLEVEL, PRINLENGTH, and SUPPRESS-ARRAY-PRINTING.  This format reads
in as a simple vector of element-type T.

All other arrays print out using the syntax #nA(...), where n is the
number of dimensions and the list is a nest of sublists n levels deep,
with the array elements at the deepest level.  This form observes
PRINLEVEL, PRINLENGTH, and SUPPRESS-ARRAY-PRINTING.  This format reads
in as an array of element-type T.

Query: I am still a bit uneasy about the funny string-like syntax for
bit vectors.  Clearly we need some way to read these in that does not
turn into a type-T vector.  An alternative might be to allow #(...) to
be a vector of element-type T, as it is now, but to take the #n(...)
syntax to mean a vector of element-type (MOD n).  A bit-vector would
then be #2(1 0 1 0...) and we would have a parallel notation available
for byte vectors, 32-bit word vectors, etc.  The use of the #n(...)
syntax to indicate the length of the vector always struck me as a bit
useless anyway.  One flaw in this scheme is that it does not extend to
multi-D arrays.  Before someone suggests it, let me say that I don't
like #nAm(...), where n is the rank and m is the element-type -- it
would be too hard to remember which number was which.  But even with
this flaw, the #n(...) syntax might be useful.

RPG Memorial Proposal
∂22-Sep-82  2138	Scott E. Fahlman <Fahlman at Cmu-20c> 	Arrays and vectors (again)
Date: Thursday, 23 September 1982  00:38-EDT
From: Scott E. Fahlman <Fahlman at Cmu-20c>
To:   common-lisp at SU-AI
Subject: Arrays and vectors (again)


Several people have stated that they dislike my earlier proposal because
it uses the good names (VECTOR, STRING, BIT-VECTOR, VREF, CHAR, BIT) on
general 1-D arrays, and makes the user say "simple" when he wants one of
the more specialized high-efficiency versions.  This makes extra work
for users, who will want simple vectors at least 95% of the time.  In
addition, there is the argument that simple vectors should be thought of
as a first-class data-type (in implementations that provide them) and
not as a mere degenerate form of array.

Just to see what it looks like, I have re-worked the earlier proposal to
give the good names to the simple forms.  This does not really eliminate
any of the classes in the earlier proposal, since each of those classes
had some attributes or operations that distinguished it from the others.

Since there are getting to be a lot of proposals around, we need some
nomencalture for future discussions.  My first attempt, with the
user-settable :PRINT option should be called the "print-switch"
proposal; the next one, with the heavy use of the :SIMPLE switch should
be the "simple-switch" proposal; this one can be called the "RPG
memorial" proposal.  Let me know what you think about this vs. the
simple-switch version -- I can live with either, but I really would like
to nail this down pretty soon so that we can get on with the
implementation.

**********************************************************************

Arrays can be 1-D or multi-D.  All arrays can be created by MAKE-ARRAY
and can be accessed with AREF.  Storage is done via SETF of an AREF.
1-D arrays are special, in that they are also of type SEQUENCE, and can
be referenced by ELT.  Also, only 1-D arrays can have fill pointers.

Some implementations may provide a special, highly efficient
representation for simple 1-D arrays, which will be of type VECTOR.  A
vector is 1-dimensional, cannot have a fill pointer, cannot be
displaced, and cannot be altered in size after its creation.  To get a
vector, you use the :VECTOR keyword to MAKE-ARRAY with a non-null value.
If there are any conflicting options specified, an error is signalled.
The MAKE-VECTOR form is equivalent to MAKE-ARRAY with :VECTOR T.

A STRING is a VECTOR whose element-type (specified by the :ELEMENT-TYPE
keyword) is STRING-CHAR.  Strings are special in that they print using
the "..." syntax, and they are legal inputs to a class of "string
functions".  Actually, these functions accept any 1-D array whose
element type is STRING-CHAR.  This more general class is called a
CHAR-SEQUENCE. 

A BIT-VECTOR is a VECTOR whose element-type is BIT, alias (MOD 2).
Bit-vectors are special in that they print using the #*... syntax, and
they are legal inputs to a class of boolean bit-vector functions.
Actually, these functions accept any 1-D array whose element-type is
BIT.  This more general class is called a BIT-SEQUENCE.

All arrays can be referenced via AREF, but in some implementations
additional efficiency can be obtained by declaring certain objects to be
vectors, strings, or bit-vectors.  This can be done by normal
type-declarations or by special accessing forms.  The form (VREF v n) is
equivalent to (AREF (THE VECTOR v) n).  The form (CHAR s n) is
equivalent to (AREF (THE STRING s) n).  The form (BIT b n) is equivalent
to (AREF (THE BIT-VECTOR b) n).

If an implementation does not support vectors, the :VECTOR keyword is
ignored except that the error is still signalled on inconsistent cases;
The additional restrictions on vectors are not enforced.  MAKE-VECTOR is
treated just like the equivalent make-array.  VECTORP is true of every
1-D array, STRINGP of every CHAR-SEQUENCE, and BIT-VECTOR of every
BIT-SEQUENCE.

CHAR-SEQUENCEs, including strings, self-eval; all other arrays cause an
error when passed to EVAL.  EQUAL descends into CHAR-SEQUENCEs, but not into
any other arrays.  EQUALP descends into arrays of all kinds, comparing
the corresponding elements with EQUALP.  EQUALP is false if the array
dimensions are not the same, but it is not sensitive to the element-type
of the array, whether it is a vector, etc.  In comparing the dimensions of
vectors, EQUALP uses the length from 0 to the fill pointer; it does not
look at any elements beyond the fill pointer.

The set of type-specifiers required for all of this is ARRAY, VECTOR,
STRING, BIT-VECTOR, SEQUENCE, CHAR-SEQUENCE, and BIT-SEQUENCE.
Each of these has a corresponding type-P predicate, and each can be
specified in list from, along with the element-type and dimension(s).

MAKE-ARRAY takes the following keywords: :ELEMENT-TYPE, :INITIAL-VALUE,
:INITIAL-CONTENTS, :FILL-POINTER, :DISPLACED-TO, :DISPLACED-INDEX-OFFSET,
and :VECTOR.

The following functions are redundant, but should be retained for
clarity and emphasis in code: MAKE-VECTOR, MAKE-STRING, MAKE-BIT-VECTOR.
MAKE-VECTOR takes a single length argument, along with :ELEMENT-TYPE,
:INITIAL-VALUE, and :INITIAL-CONTENTS.  MAKE-STRING and MAKE-BIT-VECTOR
are like MAKE-VECTOR, but do not take the :ELEMENT-TYPE keyword, since
the element-type is implicit.

If the :VECTOR keyword is not specified to MAKE-ARRAY or related forms,
the default is NIL.  However, sequences produced by random forms such as
CONCATENATE are vectors.

Strings always are printed using the "..." syntax.  Bit-vectors always
are printed using the #*... syntax.  Other vectors always print using
the #(...) syntax.  Note that in the latter case, any element-type
restriction is lost upon readin, since this form always produces a
vector of type T when it is read.  However, the new vector will be
EQUALP to the old one.  The #(...) syntax observes PRINLEVEL,
PRINLENGTH, and SUPPRESS-ARRAY-PRINTING.  The latter switch, if non-NIL,
causes the array to print in a non-readable form: #<ARRAY...>.

CHAR-SEQUENCEs print out as though they were strings, using the "..."
syntax.  BIT-SEQUENCES print out as BIT-STRINGS, using the #*... syntax.
All other arrays print out using the #nA(...) syntax, where n is the
number of dimensions and the list is actually a list of lists of lists,
nested n levels deep.  The array elements appear at the lowest level.
The #A syntax also observes PRINLEVEL, PRINLENGTH, and
SUPPRESS-ARRAY-PRINTING.  The #A format reads in as a non-displaced
array of element-type T.

Note that when an array is printed and read back in, the new version is
EQUALP to the original, but some information about the original is lost:
whether the original was a vector or not, element type restrictions,
whether the array was displaced, whether there was a fill pointer, and
the identity of any elements beyond the fill-pointer.  This choice was
made to favor ease of interactive use; if the user really wants to
preserve in printed form some complex data structure containing more
complex arrays, he will have to develop his own print format and printer.

Moon's Proposal
∂29-Sep-82  2349	MOON at SCRC-TENEX 	arrays and vectors  (long carefully-thought-out message)    
Date: Thursday, 30 September 1982  01:59-EDT
From: MOON at SCRC-TENEX
To:   common-lisp at sail
Subject: arrays and vectors  (long carefully-thought-out message)

I prefer the "simple switch" to the "RPG memorial" proposal, with one
modification to be found below.  The reason for this preference is that
it makes the "good" name, STRING for example, refer to the general class
of objects, relegating the efficiency decision to a modifier ("simple").
The alternative makes the efficiency issue too visible to the casual user,
in my opinion.  You have to always be thinking "do I only want this to
work for efficient strings, which are called strings, or should it work
for all kinds of strings, which are called arrays of characters?".
Better to say, "well this works for strings, and hmm, is it worth
restricting it to simple-strings to squeeze out maximal efficiency"?

Lest this seem like I am trying to sabotage the efficiency of Lisp
implementations that are stuck with "stock" hardware, consider the
following:

In the simple switch proposal, how is (MAKE-ARRAY 100) different from
(MAKE-ARRAY 100 :SIMPLE T)?  In fact, there is only one difference--it is
an error to use ADJUST-ARRAY-SIZE on the latter array, but not on the
former.  Except for this, simpleness consists, simply, of the absence of
options.  This suggests to me that the :SIMPLE option be flushed, and
instead a :ADJUSTABLE-SIZE option be added (see, I pronounce the colons).
Even on the Lisp machine, where :ADJUSTABLE-SIZE makes no difference, I
think it would be an improvement, merely for documentation purposes.  Now
everything makes sense: if you don't ask for any special features in your
arrays, you get simple ones, which is consistent with the behavior of the
sequence functions returning simple arrays always.  And if some
implementation decides they need the sequence functions to return
non-simple arrays, they can always add additional keywords to them to so
specify.  The only time you need to know about the word "simple" at all is
if you are making type declarations for efficiency, in which case you have
to decide whether to declare something to be a STRING or a SIMPLE-STRING.
And it makes sense that the more restrictive declaration be a longer word.
This also meets RPG's objection, which I think boils down to the fact
that he thought it was stupid to have :SIMPLE T all over his programs.
He was right.

I'm fairly sure that I don't understand the portability issues that KMP
brought up (I don't have a whole lot of time to devote to this).  But I
think that in my proposal STRINGP and SIMPLE-STRINGP are never the same
in any implementation; for instance, in the Lisp machine STRINGP is true
of all strings, while SIMPLE-STRINGP is only true of those that do not
have fill-pointers.  If we want to legislate that the :ADJUSTABLE-SIZE
option is guaranteed to turn off SIMPLE-STRINGP, I expect I can dig up
a bit somewhere to remember the value of the option.  This would in fact
mean that simple-ness is a completely implementation-independent concept,
and the only implementation-dependence is how much (if any) efficiency
you gain by using it, and how much of that efficiency you get for free
and how much you get only if you make declarations.

Perhaps the last sentence isn't obvious to everyone.  On the LM-2 Lisp
machine, a simple string is faster than a non-simple string for many
operations.  This speed-up happens regardless of declarations; it is a
result of a run-time dispatch to either fast microcode or slow microcode.
On the VAX with a dumb compiler and no tuning, a simple string is only
faster if you make declarations.  On the VAX with a dumb compiler but some
obvious tuning of sequence and string primitives to move type checks out of
inner loops (making multiple copies of the inner loop), simple strings are
faster for these operations, but still slow for AREF unless you make a type
declaration.  On the VAX with a medium-smart compiler that does the same
sort of tuning on user functions, simple strings are faster for user
functions, too, if you only declare (OPTIMIZE SPEED) [assuming that the
compiler prefers space over speed by default, which is the right choice in
most implementations], and save space as well as time if you go whole hog
and make a type declaration.  On the 3600 Lisp machine, you have sort of a
combination of the first case and the last case.

I also support the #* syntax for bit vectors, rather than the #" syntax.
It's probably mere temporal accident that the simple switch proposal
uses #" while the RPG memorial proposal uses #*.

To sum up:

A vector is a 1-dimensional array.  It prints as #(foo bar) or #<array...>
depending on the value of a switch.

A string is a vector of characters.  It always prints as "foo".  Unlike
all other arrays, strings self-evaluate and are compared by EQUAL.

A bit-vector is a vector of bits.  It always prints as #*101.  Since as
far as I can tell these are redundant with integers, perhaps like integers
they should self-evaluate and be compared by EQUAL.  I don't care.

A simple-vector, simple-string, or simple-bit-vector is one of the above
with none of the following MAKE-ARRAY (or MAKE-STRING) options specified:

	:FILL-POINTER
	:ADJUSTABLE-SIZE
	:DISPLACED-TO
	:LEADER-LENGTH, :LEADER-LIST (in implementations that offer them)

There are type names and predicates for the three simple array types.  In
some implementations using the type declaration gets you more efficient
code that only works for that simple type, which is why these are in the
language at all.  There are no user-visible distinctions associated with
simpleness other than those implied by the absence of the above MAKE-ARRAY
options.